Abstract

In India, agriculture plays a crucial role in supplying food for the population
and contributes significantly to the country's Gross Domestic Product (GDP).
For farmers to achieve higher yields and profitability, it is essential to select a crop
based on soil parameters. To simplify this process, a system for crop
Automated Machine Learning is being used to simplify and speed up the process.
Automated Machine Learning automatically selects algorithms, features,
and hyperparameters to make predictions, which can yield more accurate
results. This study examines various Automated Machine Learning frameworks
and compares the accuracy scores of different crop recommendation systems. H2O
and AutoGluon achieved the highest accuracy score of 92.0%.

1. Introduction
India holds the top position globally in terms of net
cropped area, with the United States and China
following closely. The country plays a significant
role in the global agricultural industry, with 58% of
its population relying on agriculture as their
primary source of income. India boasts several
remarkable achievements in agriculture, including
the largest herd of buffaloes and extensive
cultivation of crops such as wheat, rice, and
cotton, and is the world's largest producer of milk,
pulses, and spices. Artificial Intelligence,
particularly through the implementation of Machine
Learning (ML), offers effective solutions to the
problems faced by farmers in agriculture. Indian
farmers face three major challenges. First, they
need to identify suitable crops that can thrive in
their specific land or soil conditions. Next, there is
a lack of automation in the crop cycle system.
Lastly, farmers often struggle to obtain fair prices
for their agricultural commodities. Precision
Agriculture (PA), an emerging field in computer
science for agriculture, addresses these issues
through the application of artificial intelligence 
(AI). PA involves the development of AI-based
tools and data-driven solutions, such as crop 
recommendations, fertilizer suggestions, and 
pest/disease detection, to enhance agricultural 
outcomes. ML algorithms can assist in selecting the 
most appropriate crop for specific farming land, 
taking into account soil parameters such as 
Nitrogen, Potassium, Phosphorus, and others. By 
making informed decisions regarding crop 
selection, farmers can achieve higher yields and 
increased profits. 

2. Related Works 

A comparative analysis [1] evaluated an Automated
Machine Learning (AutoML) approach against a
conventional ensemble learning method, using a
K-Nearest Oracle (KNORA) AutoML model to predict
the dropout rate of students in sub-Saharan African
countries. The KNORA AutoML system achieved 97%
accuracy and 71% precision, slightly better than
the conventional ensemble model's 96% accuracy and
70% precision, so the AutoML model predicted the
dropout rate more accurately. The system
[2] can estimate crop yield and biomass using 
hyperspectral images. The estimation results are 
provided in the form of determination coefficient 
(R2) and Normalized Root Mean Square Error 
(NRMSE) metrics. Under various agricultural 
resource conditions, the implementation flexibility 
and learning cost can be reduced by utilizing the 
open-source AutoML system and an R language-based
package. An efficient platform and framework [3]
based on the H2O AutoML approach was applied to
the estimation of soybean and corn seed protein
and oil composition. The Gradient Boosting
Machine (GBM) of H2O outperformed other
algorithms on combined images from Unmanned
Aerial Vehicle (UAV) based hyperspectral and LiDAR
(Light Detection and Ranging) sensors. The study
also investigated the model with crop images taken
at different time points to analyse prediction
results. An AutoML-based system [4] for
weed detection by considering two different 
datasets produced promising performance results 
with 93.8% and 90.7% F1 scores depending on the 
dataset. It also enabled a balance between AutoML 
and manual expert work to increase efficiency in 
plant protection. The developed model was evaluated
on the original and noisy versions of the early
crop weed and plant seedlings datasets. A comparative
analysis [5] applied traditional Machine Learning
models and the AutoML frameworks H2O,
AutoSklearn, AutoGluon, TPOT, AutoKeras,
EvalML, and TransmogrifAI to prediction on
time-series data such as stock prices, business
development, weather, and economic status. It also
compared the parameters and hyperparameters of
the Machine Learning algorithms involved in AutoML.
A classification system [6] for crops and weeds
used two datasets: dataset 1 consists of real-world
images collected by an agricultural robot, and
dataset 2 is the open-source PlantVillage dataset.
AutoML selected different CNN-based ensemble
models, and the ensemble with the Dual Metrics (DM)
objective function gave better results than other
models, such as the ensembles with the No Miss Weed
(NMW) and Categorical Cross Entropy (CCE)
objective functions.

3. Methods 
For this study, a dataset [7] was acquired for the
combined district of Kurnool, including Nandyal
and Kurnool, located in the state of Andhra Pradesh.
The dataset was obtained from an official
government website. To ensure data quality, the
dataset was preprocessed by removing duplicate
and outlier values. After preprocessing, the dataset
consists of 5,649 rows and 12 columns, for a total
of 67,788 data values (5,649 x 12).
The dataset exclusively contains soil
parameter values, including Potassium (K), 
Phosphorus (P), Nitrogen (N), pH, Electrical 
Conductivity (EC), Iron (Fe), Zinc (Zn), Sulphur 
(S), Organic Carbon (OC), Copper (Cu), 
Manganese (Mn), and Boron (B). 
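
The preprocessing step described above can be sketched as follows. The paper does not state which outlier criterion was used, so the interquartile-range (IQR) rule here is an assumption, and the sample records and function names (`drop_duplicates`, `iqr_bounds`, `drop_outliers`) are hypothetical illustrations rather than the study's actual data or code.

```python
from statistics import quantiles


def drop_duplicates(rows):
    """Keep only the first occurrence of each identical record."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) limits using the IQR rule (an assumed criterion)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows, k=1.5):
    """Remove every record that has at least one value outside its column's limits."""
    columns = list(zip(*rows))
    bounds = [iqr_bounds(col, k) for col in columns]
    return [
        row
        for row in rows
        if all(lo <= value <= hi for value, (lo, hi) in zip(row, bounds))
    ]


# Hypothetical (N, P, K) soil records, not taken from the paper's dataset.
samples = [
    (210.0, 18.0, 250.0),
    (210.0, 18.0, 250.0),   # exact duplicate of the first record
    (195.0, 20.0, 240.0),
    (205.0, 17.0, 260.0),
    (200.0, 19.0, 245.0),
    (215.0, 21.0, 255.0),
    (198.0, 16.0, 9000.0),  # implausibly large K value
]

deduplicated = drop_duplicates(samples)
cleaned = drop_outliers(deduplicated)
print(len(samples), len(deduplicated), len(cleaned))
```

On a real dataset the same two steps would be applied to all 12 soil parameters before any model is trained.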

3.1 Conventional Machine Learning 
The conventional process of developing Machine 
Learning models is manual, requiring substantial 
domain knowledge, consuming resources, and 
taking considerable time to yield predictive results. 
The conventional Machine Learning process
involves sequential steps, as depicted in Figure 1:
Data Collection, Data Exploration,
Data Preparation, Feature Engineering, Model
Selection, Model Training, Hyperparameter
Tuning, and Prediction.


Figure 1 Process of Conventional Machine
Learning (cycle of Data Collection, Data Exploration,
Data Preparation, Feature Engineering, Model Selection,
Model Training, Hyperparameter Tuning, Prediction)

The key limitations of conventional Machine
Learning are:
• It typically requires a high level of expertise
  in feature engineering, algorithm selection,
  hyperparameter tuning, and model evaluation.

• It often relies on manual feature
  engineering, where domain experts
  handcraft features based on their
  understanding of the problem.

• Conventional Machine Learning requires
  manual tuning of hyperparameters, which
  can be challenging and time-consuming due
  to the large parameter space.

• It relies on a limited set of popular
  algorithms that are well-known and
  well-understood by researchers and practitioners.
  Exploring alternative algorithms requires
  manual effort and expertise.

• Conventional ML workflows are often not
  easily reproducible, as the manual nature of
  the process can lead to inconsistencies.

• Conventional ML frameworks and tools
  may have a steep learning curve, making it
  difficult for non-experts to leverage ML
  techniques effectively.

3.2 Automated Machine Learning 

The term "Automated Machine Learning," also 
known as “AutoML” [8] refers to the utilization of 
techniques, procedures, and frameworks to 
automate all or part of the Machine Learning 
pipeline. It provides pre-built components and 
resources to expedite and enhance the machine- 
learning process. AutoML streamlines the model 
development process by minimizing concerns about 
specific implementation details like 
hyperparameters, individual model selection, and 
other minor aspects that could impede progress. 
With AutoML, the creation of production-ready 
Machine Learning models becomes faster, simpler, 
and more efficient. AutoML aims to democratize
machine learning by making it accessible to users
with limited expertise in data science or 
programming. It allows individuals from various 
domains to leverage the power of machine learning 
without the need to have an in-depth understanding 
of its intricacies. AutoML tools and platforms often 
provide user-friendly interfaces and workflows to 
streamline the machine learning process and enable 
faster development and deployment of models. The
steps involved in AutoML are illustrated in Figure 2.


Figure 2 Process of Automated Machine
Learning (cycle of Data Collection, Data Exploration,
Data Preparation, Feature Engineering, Model Selection,
Model Training, Prediction)

One of the key aspects of AutoML is data 
preprocessing. This involves handling various data- 
related tasks such as missing data imputation, 
outlier detection, and data normalization or scaling. 
AutoML algorithms automatically handle these 
tasks, ensuring that the data is in a suitable format 
for model training. Another important step is 
feature engineering. Traditionally, feature 
engineering requires domain experts to manually 
identify and create relevant features from raw data. 
However, AutoML algorithms can automatically 
select useful features from the available data or 
generate new features through transformations or 
combinations. This reduces the need for manual 
feature engineering and can potentially lead to more 
efficient and accurate models. Model selection is 
another crucial aspect that AutoML addresses. 
Instead of manually selecting and evaluating 
different models, AutoML tools explore a wide 
range of models, including both traditional 
algorithms and more advanced techniques like 
neural networks. They automatically evaluate the 
performance of these models on the given dataset 
and problem, helping users identify the best- 
performing model without requiring extensive 
knowledge of various algorithms. Hyperparameter 
tuning is a critical step in optimizing model
performance. AutoML automates this process by
systematically searching the hyperparameter space
to find the optimal configuration for a given model
and dataset. Techniques like random search,
Bayesian optimization, grid search, or genetic
algorithms are employed to explore the
hyperparameter space and find the best
combinations efficiently. Furthermore, AutoML
provides mechanisms for model evaluation and 
selection. It allows users to assess the performance 
of different models using evaluation metrics and 
validation techniques. This helps users make 
informed decisions about which model to choose 
for deployment based on their specific requirements 
and performance criteria. AutoML can be applied to
various domains, including natural language
processing, computer vision, and other deep
learning tasks. It automates the challenging
tasks of selecting the appropriate model and
improving its performance based on the provided
data. AutoML simplifies the Machine Learning
process, making it more accessible, although the
resulting system resembles a black box. It automates
various stages of the ML pipeline that involve
applying algorithms to real-world situations, for
which a human operator would otherwise need to
understand the algorithm's internal logic and its
practical application. Available AutoML frameworks
include AutoKeras, AutoSklearn, AutoGluon, PyCaret,
H2O, MLBox, AutoWeka, and TransmogrifAI.
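
The hyperparameter search described above can be illustrated with a minimal random-search sketch. It is not the search routine of any particular AutoML framework: the k-nearest-neighbour classifier, the synthetic two-class data, the labels `crop_a` and `crop_b`, and the search space for `k` and `p` are all assumptions chosen to keep the example self-contained.

```python
import random


def knn_predict(train, query, k, p):
    """Majority vote among the k nearest training points (Minkowski distance of order p)."""
    def dist(a, b):
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    labels = [label for _, label in nearest]
    # Sorting the candidate labels makes tie-breaking deterministic.
    return max(sorted(set(labels)), key=labels.count)


def validation_accuracy(train, valid, k, p):
    """Fraction of validation points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k, p) == y for x, y in valid)
    return hits / len(valid)


def random_search(train, valid, n_trials, seed=0):
    """Sample configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_score, best_config = -1.0, None
    for _ in range(n_trials):
        config = {"k": rng.choice([1, 3, 5, 7, 9]), "p": rng.choice([1, 2])}
        score = validation_accuracy(train, valid, **config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score


def make_points(rng, center, label, n):
    """Draw n synthetic 2-D points around a centre (illustrative data only)."""
    return [
        ((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)), label)
        for _ in range(n)
    ]


data_rng = random.Random(42)
data = make_points(data_rng, (0.0, 0.0), "crop_a", 40)
data += make_points(data_rng, (4.0, 4.0), "crop_b", 40)
data_rng.shuffle(data)
train, valid = data[:60], data[60:]

best_config, best_score = random_search(train, valid, n_trials=10)
print(best_config, best_score)
```

Grid search would replace the random sampling with an exhaustive loop over every combination, while Bayesian optimization would use earlier scores to decide which configuration to try next.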
3.2.1 H2O
H2O [9] is an open-source, distributed, in-memory
Machine Learning platform created by H2O.ai.
It supports both R and Python, as well as the most
popular statistical and Machine Learning methods,
such as deep learning, generalised linear models,
and gradient-boosted machines. H2O employs its
own algorithms to build pipelines and contains a
module for AutoML. Pipelines are optimised using
a thorough search over feature engineering
techniques and hyperparameter tuning. H2O
automates major activities such as feature
engineering, model selection, model deployment,
and hyperparameter tuning, which are among the
complicated tasks needed to make Machine Learning
models effective. Additionally, it automates
visualisation and Machine Learning interpretation.
3.2.2 Auto-Keras 

Auto-Keras [10] is an open-source library for
automated Machine Learning (AutoML) created by
DATA Lab. Auto-Keras offers tools for the
automatic search and selection of deep learning
model architectures and hyperparameters. It is
simple to use since it adheres to the traditional
Scikit-Learn API design. The latest version
can perform deep learning while automatically
searching for hyperparameters. Auto-Keras
automates Neural Architecture Search (NAS)
techniques to simplify Machine Learning. NAS
employs a series of algorithms that adjust models
automatically, in place of deep learning practitioners
and engineers.

3.2.3 TPOT 
TPOT [11] is a fully automated Machine Learning
framework, positioned as a platform to simplify
routine Machine Learning processes. A genetic
algorithm is employed to identify the best model:
the candidate with the greatest predicted accuracy
across a wide range of models is chosen. This
framework is a scikit-learn add-on, similar to
Auto-Sklearn. However, TPOT follows its own
algorithms for classification and regression.
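
The idea of using a genetic algorithm to pick a model can be sketched with a toy example. This is not TPOT's implementation: the candidate encoding (a model name plus a depth value), the `fitness` function, and its scores are hypothetical stand-ins for a cross-validated accuracy that a real system would measure on data.

```python
import random

MODELS = ["decision_tree", "random_forest", "knn"]
DEPTHS = list(range(1, 11))


def fitness(individual):
    """Hypothetical stand-in for cross-validated accuracy (not a real measurement)."""
    model, depth = individual
    base = {"decision_tree": 0.70, "random_forest": 0.80, "knn": 0.65}[model]
    return base - 0.01 * abs(depth - 6)


def mutate(individual, rng):
    """Randomly change either the model or the depth of a candidate."""
    model, depth = individual
    if rng.random() < 0.5:
        model = rng.choice(MODELS)
    else:
        depth = rng.choice(DEPTHS)
    return (model, depth)


def crossover(parent_a, parent_b):
    """Combine the model of one parent with the depth of the other."""
    return (parent_a[0], parent_b[1])


def evolve(generations=20, pop_size=12, seed=0):
    """Evolve a population of candidates and return the fittest one."""
    rng = random.Random(seed)
    population = [(rng.choice(MODELS), rng.choice(DEPTHS)) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Variation: refill the population with mutated offspring.
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b), rng))
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

Because the better half of each generation is carried over unchanged, the best score in the population never decreases from one generation to the next.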

3.2.4 Dataiku 
Dataiku is a well-designed framework that automates
the majority of Machine Learning phases and can be
used without any prior programming or Machine
Learning experience. This AutoML tool quickly
and accurately constructs prediction models, and
Machine Learning models can be created easily
through Dataiku's user interface. A business can
quickly implement a real-time predictive analytics
solution powered by a precise Machine
Learning model. The ability to dive deeper into the
platform and take charge of the Machine Learning
workflow is a major benefit of Dataiku: business
analysts can use it as a tool, while skilled data
scientists can tune many parameters on their own
to obtain even more accurate models.

3.2.5 AutoGluon
Developed by Amazon Web Services (AWS), AutoGluon
[12] is an advanced framework for Automated
Machine Learning (AutoML). Its primary objective
is to streamline the construction and
implementation of machine learning models by
automating key steps like feature engineering,
hyperparameter tuning, and model selection. By
leveraging a robust search algorithm, AutoGluon
efficiently explores a wide range of potential
machine learning pipelines, identifying the
best-performing combinations.

3.2.6 Pycaret
PyCaret is a Python library that simplifies the
machine learning workflow by offering a
high-level interface. It is an open-source, simple,
low-code solution that automates various steps
in the process of building and deploying machine
learning models. PyCaret's main goal is to make
machine learning accessible to individuals with
varying levels of experience, as it significantly
reduces the amount of code and effort needed for
implementation.

3.2.7 Auto-ML framework comparison
Various Auto-ML frameworks were analysed with
respect to basic working principles and supported
Machine Learning algorithms (Table 1).


Table 1 Various Auto-ML Frameworks Methodology

| AutoML framework | Working method | Supported Machine Learning algorithms |
|---|---|---|
| H2O | H2O returns the best-performing model from the leaderboard of all models, based on the training of two stacked ensembles. | Support Vector Machine, XGBoost, Stacked Ensembles, Naive Bayes, Gradient Boosting Machine, Generalized Linear Model, Random Forest, and Deep Learning |
| AutoKeras | Neural Architecture Search (NAS) is a method that makes use of the Keras API and searches over neural network architectures to determine which one would best handle a modelling problem. | Deep Learning and simple Machine Learning models |
| AutoGluon | Trains a group of models under various conditions and parameters, then chooses the most effective ones by optimising hyperparameters. A random search, grid search, or Bayesian optimisation is the basis for the search method for the ideal collection of parameters. | K-Nearest Neighbour, Light Gradient Boost Machine, Random Forest, XGBoost, Extremely Randomized Trees, Neural Networks |
| Pycaret | Pycaret follows a step-by-step process covering data preparation, model training, hyperparameter tuning, model analysis, model selection, and best results of all models. | XGBoost, Light Gradient Boost Machine, Gradient Boost Classifier, AdaBoost Classifier, Random Forest, Extra Trees, Logistic Regression, K-Nearest Neighbour, Support Vector Machine, Naive Bayes, and others |
| TPOT | TPOT uses genetic algorithms to automate the design of Machine Learning models and optimize the Machine Learning pipeline. | XGBoost, Decision Tree, Logistic Regression, Random Forest, and KNN |
| Dataiku | Dataiku makes Machine Learning accessible by designing fully automated models in the form of feature extraction, feature selection, and model selection. It enables the selection of models and hyperparameters either manually or automatically. | XGBoost, Decision Tree, Light Gradient Boost Machine, Gradient Boost Classifier, AdaBoost Classifier, Random Forest, Extra Trees, Logistic Regression, K-Nearest Neighbour, Naive Bayes, Single Layer Perceptron, Support Vector Machine, and others |

4. Results and Discussion
4.1 Results 

Each AutoML framework is applied to the dataset
to evaluate the prediction results. Framework
results are shown in Table 2 along with the
best-performing Machine Learning algorithm for
each framework. These results are in line with
those given by traditional Machine Learning models
such as XGBoost. Accuracy is one of the evaluation
metrics in Machine Learning. It is the rate of
correct predictions made by the model.


\[
\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1}
\]


Where TP means True Positive, TN means True 
Negative, FP means False Positive, and FN means 
False Negative. 
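
As a worked illustration of Equation (1), the short sketch below computes accuracy from confusion-matrix counts and, equivalently, as the share of matching labels, which is the form that carries over to multi-class problems such as crop recommendation. The counts and crop labels are hypothetical and are not results from this study.

```python
def accuracy_from_counts(tp, tn, fp, fn):
    """Equation (1): correct predictions divided by all predictions."""
    return (tp + tn) / (tp + fp + tn + fn)


def accuracy_from_labels(y_true, y_pred):
    """Share of predictions equal to the true label (works for any number of classes)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical binary confusion-matrix counts.
binary_accuracy = accuracy_from_counts(tp=40, tn=45, fp=5, fn=10)

# Hypothetical multi-class crop labels.
y_true = ["rice", "wheat", "cotton", "rice"]
y_pred = ["rice", "wheat", "rice", "rice"]
multiclass_accuracy = accuracy_from_labels(y_true, y_pred)

print(binary_accuracy, multiclass_accuracy)
```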


Table 2 Accuracy of AutoML Frameworks

| S.No | AutoML framework | Accuracy |
|---|---|---|
| 1 | H2O | 92.0% by Stacked Ensemble |
| 2 | AutoKeras | 56.4% by Deep Learning |
| 3 | AutoGluon | 92.0% by Random Forest with Gini index |
| 4 | Pycaret | 91.1% by XGBoost |
| 5 | TPOT | 72.6% by Random Forest Classifier |
| 6 | Dataiku | 91.1% by Light Gradient Boost Machine and Gradient Boosting Tree |


4.2 Discussion 
H2O produced an accuracy of 92.0% with the
Stacked Ensemble model. An accuracy of 92.0%
was also obtained by the Random Forest Classifier
with the Gini index in AutoGluon. Both Pycaret and
Dataiku gave the same accuracy of 91.1%.
These two frameworks provide facilities such as
selecting particular ML algorithms to take part in
the prediction, which supports better analysis of
the predicted results. The Random Forest Classifier
of TPOT gave an accuracy of 72.6%, and Deep
Learning in AutoKeras gave an accuracy of 56.4%.
Figure 3 shows that the algorithms of AutoGluon
and H2O give better accuracy than the remaining
AutoML frameworks.


Figure 3 Accuracy of Automated Machine
Learning Frameworks (bar chart, accuracy axis from
0% to 100%, highest value 92.00%)


Conclusion 

The process of recommending a crop using 
Machine Learning can be complex and time- 
consuming, requiring several steps from data 
collection to the final model prediction result. With
the incorporation of AutoML, this process can be
simplified into a more straightforward, efficient,
and effective method.
AutoML frameworks are designed to automate the 
selection of algorithms and features, as well as tune 
hyperparameters, to produce accurate crop 
recommendations based on soil parameters. By 
leveraging AutoML in agriculture, farmers can 
make more informed decisions regarding crop 
selection, resulting in higher yields and profits. 
Overall, AutoML has the potential to revolutionize 
the agricultural industry, making it automated, more 
efficient, sustainable, and profitable for all 
involved.